Masaru OYA Masao YANAGISAWA Nozomu TOGAWA
Modern digital integrated circuits (ICs) are often designed and fabricated by third parties and tools, which can make IC design/fabrication vulnerable to malicious modifications. The malicious circuits are generally referred to as hardware Trojans (HTs) and they are considered to be a serious security concern. In this paper, we propose a logic-testing based HT detection and classification method utilizing steady state learning. We first observe that HTs are hidden while applying random test patterns in a short time but most of them can be activated in a very long-term random circuit operation. Hence it is very natural that we learn steady signal-transition states of every suspicious Trojan net in a netlist by performing short-term random simulation. After that, we simulate or emulate the netlist in a very long time by giving random test patterns and obtain a set of signal-transition states. By discovering correlation between them, our method detects HTs and finds out its behavior. HTs sometimes do not affect primary outputs but just leak information over side channels. Our method can be successfully applied to those types of HTs. Experimental results demonstrate that our method can successfully identify all the real Trojan nets to be Trojan nets and all the normal nets to be normal nets, while other existing logic-testing HT detection methods cannot detect some of them. Moreover, our method can successfully detect HTs even if they are not really activated during long-term random simulation. Our method also correctly guesses the HT behavior utilizing signal transition learning.
Saki TAJIMA Nozomu TOGAWA Masao YANAGISAWA Youhua SHI
To deal with the reliability issue caused by soft errors, this paper proposed a low power soft error hardened latch (SHC) design using a novel Schmitt-Trigger-based C-element for reliable low power applications. Unlike state-of-the-art soft error tolerant latches that are usually based on hardware redundancy with large area overhead and high power consumption, the proposed SHC latch is implemented through double-sampling and node-checking using a novel Schmitt-Trigger-based C-element, which can help to reduce the area overhead and the corresponding power consumption as well. The evaluation results show that the total number of transistors of the proposed SHC latch is only increased by 2 when compared to the conventional unhardened C2MOS latch, while up to 20.35% and 82.96% power reduction can be achieved when compared to the conventional unhardened C2MOS latch and the existing soft error tolerant HiPeR design, respectively.
Tatsuro KOJO Masashi TAWADA Masao YANAGISAWA Nozomu TOGAWA
Non-volatile memories are a promising alternative to memory design but data stored in them still may be destructed due to crosstalk and radiation. The data stored in them can be restored by using error-correcting codes but they require extra bits to correct bit errors. One of the largest problems in non-volatile memories is that they consume ten to hundred times more energy than normal memories in bit-writing. It is quite necessary to reduce writing bits. Recently, a REC code (bit-write-reducing and error-correcting code) is proposed for non-volatile memories which can reduce writing bits and has a capability of error correction. The REC code is generated from a linear systematic error-correcting code but it must include the codeword of all 1's, i.e., 11…1. The codeword bit length must be longer in order to satisfy this condition. In this letter, we propose a method to generate a relaxed REC code which is generated from a relaxed error-correcting code, which does not necessarily include the codeword of all 1's and thus its codeword bit length can be shorter. We prove that the maximum flipping bits of the relaxed REC code is still limited theoretically. Experimental results show that the relaxed REC code efficiently reduce the number of writing bits.
Youhua SHI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
In this paper, we presented a Design-for-Secure-Test (DFST) technique for pipelined AES to guarantee both the security and the test quality during testing. Unlike previous works, the proposed method can keep all the secrets inside and provide high test quality and fault diagnosis ability as well. Furthermore, the proposed DFST technique can significantly reduce test application time, test data volume, and test generation effort as additional benefits.
Koki ITO Kazushi KAWAMURA Yutaka TAMIYA Masao YANAGISAWA Nozomu TOGAWA
An (M,N)-field-data extractor reads out any consecutive N bytes from an M-byte register by connecting its input/output using a multiplexer (MUX) network. It is used in packet analysis and/or stream data processing for video/audio data. In this letter, we propose an efficient MUX network for an (M,N)-field-data extractor. By bi-partitioning a simple MUX network into an upper one and a lower one, we can theoretically reduce the number of required MUXs without increasing the MUX network depth. Experimental results show that we can reduce the gate count by up to 92% compared to a naive approach.
Nozomu TOGAWA Takafumi HISAKI Masao YANAGISAWA Tatsuo OHTSUKI
This paper proposes a high-level synthesis system for datapath design of digital processing hardwares. The system consists of four phases: (1) DFG (data-flow graph) generation, (2) scheduling, (3) resource binding, and (4) HDL (hardware description language) generation. In (1), the system does not generate only one best DFG representing a given behavioral description of a hardware, but more than one good DFGs representing it. In (2) and (3), several synthesis tools can be incorporated into the system depending on the required objectives. Thus we can obtain more than one datapath candidates for a behavioral description with their area and performance evaluation. In (4), the best datapath design is selected among those candidates and its hardware description is generated. The experimental results for applying the system to several benchmarks show the effectiveness and efficiency.
Yuri USAMI Kazuaki ISHIKAWA Toshinori TAKAYAMA Masao YANAGISAWA Nozomu TOGAWA
It becomes possible to prevent accidents beforehand by predicting dangerous riding behavior based on recognition of bicycle behaviors. In this paper, we propose a bicycle behavior recognition method using a three-axis acceleration sensor and three-axis gyro sensor equipped with a smartphone when it is installed on a bicycle handlebar. We focus on the periodic handlebar motions for balancing while running a bicycle and reduce the sensor noises caused by them. After that, we use machine learning for recognizing the bicycle behaviors, effectively utilizing the motion features in bicycle behavior recognition. The experimental results demonstrate that the proposed method accurately recognizes the four bicycle behaviors of stop, run straight, turn right, and turn left and its F-measure becomes around 0.9. The results indicate that, even if the smartphone is installed on the noisy bicycle handlebar, our proposed method can recognize the bicycle behaviors with almost the same accuracy as the one when a smartphone is installed on a rear axle of a bicycle on which the handlebar motion noises can be much reduced.
Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
In digital signal processing, bit width of intermediate variables should be longer than that of input and output variables in order to execute intermediate operations with high precision. Then a processor core for digital signal processing is required to have two types of register files, one of which is used by input and output variables and the other one is used by intermediate variables. This paper proposes a hardware/software cosynthesis system for digital signal processor cores with two types of register files. Given an application program and its data, the system synthesizes a hardware description of a processor core, an object code running on the processor core, and software environments. A synthesized processor core can be composed of a processor kernel, multiple data memory buses, hardware loop units, addressing units, and multiple functional units. Furthermore it can have two types of register files RF1 and RF2. The bit width and number of registers in RF1 or RF2 will be determined based on a given application program. Thus a synthesized processor core will have small area with keeping high precision of intermediate operations compared with a processor core with only one register file. The experimental results demonstrate the effectiveness of the proposed system.
Nozomu TOGAWA Koichi TACHIKAKE Yuichiro MIYAOKA Masao YANAGISAWA Tatsuo OHTSUKI
This letter proposes a new hardware/software partitioning algorithm for processor cores with SIMD instructions. Given a compiled assembly code including SIMD instructions and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with a new assembly code. Firstly, we assume for each operation type a super SIMD functional unit which can execute all the SIMD instructions. Secondly we reduce a SIMD instruction or "sub-function" of each super functional unit, one by one, while the timing constraint is satisfied. At the same time, we update the assembly code so that it can run on the new processor configuration. By repeating this process, we finally find SIMD functional unit configuration as well as a processor core architecture. The promising experimental results are also shown.
Siya BAO Tomoyuki NITTA Masao YANAGISAWA Nozomu TOGAWA
In this paper, we propose a safe and comprehensive route finding algorithm for pedestrians based on lighting and landmark conditions. Safety and comprehensiveness can be predicted by the five possible indicators: (1) lighting conditions, (2) landmark visibility, (3) landmark effectiveness, (4) turning counts along a route, and (5) road widths. We first investigate impacts of these five indicators on pedestrians' perceptions on safety and comprehensiveness during route findings. After that, a route finding algorithm is proposed for pedestrians. In the algorithm, we design the score based on the indicators (1), (2), (3), and (5) above and also introduce a turning count reduction strategy for the indicator (4). Thus we find out a safe and comprehensive route through them. In particular, we design daytime score and nighttime score differently and find out an appropriate route depending on the time periods. Experimental simulation results demonstrate that the proposed algorithm obtains higher scores compared to several existing algorithms. We also demonstrate that the proposed algorithm is able to find out safe and comprehensive routes for pedestrians in real environments in accordance with questionnaire results.
Nozomu TOGAWA Takao TOTSUKA Tatsuhiko WAKUI Masao YANAGISAWA Tatsuo OHTSUKI
Content addressable memory (CAM) is one of the functional memories which realize word-parallel equivalence search. Since a CAM unit is generally used in a particular application program, we consider that appropriate design for CAM units is required depending on the requirements for the application program. This paper proposes a hardware/software cosynthesis system for CAM processors. The input of the system is an application program written in C including CAM functions and a constraint for execution time (or CAM processor area). Its output is hardware descriptions of a synthesized processor and a binary code executed on it. Based on the branch-and-bound method, the system determines which CAM function is realized by a hardware and which CAM function is realized by a software with meeting the given timing constraint (or area constraint) and minimizing the CAM processor area (or execution time of the application program). We expect that we can realize optimal CAM processor design for an application program. Experimental results for several application programs show that we can obtain a CAM processor whose area is minimum with meeting the given timing constraint.
Nozomu TOGAWA Masayuki IENAGA Masao YANAGISAWA Tatsuo OHTSUKI
This paper proposes an area/time optimizing algorithm in a high-level synthesis system for control-based hardwares. Given a call graph whose node corresponds to a control flow of an application program, the algorithm generates a set of state-transition graphs which represents the input call graph under area and timing constraint. In the algorithm, first state-transition graphs which satisfy only timing constraint are generated and second they are transformed so that they can satisfy area constraint. Since the algorithm is directly applied to control-flow graphs, it can deal with control flows such as bit-wise processes and conditional branches. Further, the algorithm synthesizes more than one hardware architecture candidates from a single call graph for an application program. Designers of an application program can select several good hardware architectures among candidates depending on multiple design criteria. Experimental results for several control-based hardwares demonstrate effectiveness and efficiency of the algorithm.
Suguru ARIMOTO Masao YANAGISAWA
Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
This paper proposes a hardware/software cosynthesis system for digital signal processor cores and a hardware/software partitioning algorithm which is one of the key issues for the system. The target processor has a VLIW-type core which can be composed of a processor kernel, multiple data memory buses (X-bus and Y-bus), hardware loop units, addressing units, and multiple functional units. The processor kernel includes five pipeline stages (RISC-type kernel) or three pipeline stages (DSP-type kernel). Given an application program written in the C language and a set of application data, the system synthesizes a processor core by selecting an appropriate kernel (RISC-type or DSP-type kernel) and required hardware units according to the application program/data and the hardware costs. The system also generates the object code for the application program and a software environment (compiler and simulator) for the processor core. The experimental results demonstrate that the system synthesizes processor cores effectively according to the features of an application program and the synthesized processor cores execute most application programs with the minimum number of clock cycles compared with several existing processors.
Koichi FUJIWARA Kazushi KAWAMURA Masao YANAGISAWA Nozomu TOGAWA
Recently, high-level synthesis techniques for FPGA designs (FPGA-HLS techniques) are strongly required in various applications. Both interconnection delays and clock skews have a large impact on circuit performance implemented onto FPGA, which indicates the need for floorplan-driven FPGA-HLS algorithms considering them. To appropriately estimate interconnection delays and clock skews at HLS phase, a reasonable model to estimate them becomes essential. In this paper, we demonstrate several experiments to characterize interconnection delays and clock skews in FPGA and propose novel estimate models called “IDEF” and “CSEF”. In order to evaluate our models, we integrate them into a conventional floorplan-driven FPGA-HLS algorithm. Experimental results demonstrate that our algorithm can realize FPGA designs which reduce the latency by up to 22% compared with conventional approaches.
Akira OHCHI Nozomu TOGAWA Masao YANAGISAWA Tatsuo OHTSUKI
As device feature size decreases, interconnection delay becomes the dominating factor of circuit total delay. Distributed-register architectures can reduce the influence of interconnection delay. They may, however, increase circuit area because they require many local registers. Moreover original distributed-register architectures do not consider control signal delay, which may be the bottleneck in a circuit. In this paper, we propose a high-level synthesis method targeting generalized distributed-register architecture in which we introduce shared/local registers and global/local controllers. Our method is based on iterative improvement of scheduling/binding and floorplanning. First, we prepare shared-register groups with global controllers, each of which corresponds to a single functional unit. As iterations proceed, we use local registers and local controllers for functional units on a critical path. Shared-register groups physically located close to each other are merged into a single group. Accordingly, global controllers are merged. Finally, our method obtains a generalized distributed-register architecture where its scheduling/binding as well as floorplanning are simultaneously optimized. Experimental results show that the area is decreased by 4.7% while maintaining the performance of the circuit equal with that using original distributed-register architectures.
Nozomu TOGAWA Koichi TACHIKAKE Yuichiro MIYAOKA Masao YANAGISAWA Tatsuo OHTSUKI
This paper focuses on SIMD processor synthesis and proposes a SIMD instruction set/functional unit synthesis algorithm. Given an initial assembly code and a timing constraint, the proposed algorithm synthesizes an area-optimized processor core with optimal SIMD functional units. It also synthesizes a SIMD instruction set. The input initial assembly code is assumed to run on a full-resource SIMD processor (virtual processor) which has all the possible SIMD functional units. In our algorithm, we introduce the SIMD operation decomposition and apply it to the initial assembly code and the full-resource SIMD processor. By gradually reducing SIMD operations or decomposing SIMD operations, we can finally find a processor core with small area under the given timing constraint. The promising experimental results are also shown.
Ken HAYAMIZU Nozomu TOGAWA Masao YANAGISAWA Youhua SHI
Approximate computing is a promising solution for future energy-efficient designs because it can provide great improvements in performance, area and/or energy consumption over traditional exact-computing designs for non-critical error-tolerant applications. However, the most challenging issue in designing approximate circuits is how to guarantee the pre-specified computation accuracy while achieving energy reduction and performance improvement. To address this problem, this paper starts from the state-of-the-art general approximate adder model (GeAr) and extends it for more possible approximate design candidates by relaxing the design restrictions. And then a maximum-error-distance-based performance/accuracy formulation, which can be used to select the performance/energy-accuracy optimal design from the extended design space, is proposed. Our evaluation results show the effectiveness of the proposed method in terms of area overhead, performance, energy consumption, and computation accuracy.
Kento HASEGAWA Masao YANAGISAWA Nozomu TOGAWA
Due to the increase of outsourcing by IC vendors, we face a serious risk that malicious third-party vendors insert hardware Trojans very easily into their IC products. However, detecting hardware Trojans is very difficult because today's ICs are huge and complex. In this paper, we propose a hardware-Trojan classification method for gate-level netlists to identify hardware-Trojan infected nets (or Trojan nets) using a support vector machine (SVM) or a neural network (NN). At first, we extract the five hardware-Trojan features from each net in a netlist. These feature values are complicated so that we cannot give the simple and fixed threshold values to them. Hence we secondly represent them to be a five-dimensional vector and learn them by using SVM or NN. Finally, we can successfully classify all the nets in an unknown netlist into Trojan ones and normal ones based on the learned classifiers. We have applied our machine-learning-based hardware-Trojan classification method to Trust-HUB benchmarks. The results demonstrate that our method increases the true positive rate compared to the existing state-of-the-art results in most of the cases. In some cases, our method can achieve the true positive rate of 100%, which shows that all the Trojan nets in an unknown netlist are completely detected by our method.
Mikiko SODE TANAKA Nozomu TOGAWA Masao YANAGISAWA Satoshi GOTO
With the progress of process technology in recent years, low voltage power supplies have become quite predominant. With this, the voltage margin has decreased and therefore the on-chip decoupling capacitance optimization that satisfies the voltage drop constraint becomes more important. In addition, the reduction of the on-chip decoupling capacitance area will reduce the chip area and, therefore, manufacturing costs. Hence, we propose an algorithm that satisfies the voltage drop constraint and at the same time, minimizes the total on-chip decoupling capacitance area. The proposed algorithm uses the idea of the network algorithm where the path which has the most influence on voltage drop is found. Voltage drop is improved by adding the on-chip capacitance to the node on the path. The proposed algorithm is efficient and effectively adds the on-chip capacitance to the greatest influence on the voltage drop. Experimental results demonstrate that, with the proposed algorithm, real size power/ground network could be optimized in just a few minutes which are quite practical. Compared with the conventional algorithm, we confirmed that the total on-chip decoupling capacitance area of the power/ground network was reducible by about 4050%.